Results 1 - 20 of 18,104
1.
Sci Rep ; 14(1): 10560, 2024 05 08.
Article En | MEDLINE | ID: mdl-38720020

Research on video analytics, especially human behavior recognition, has become increasingly popular in recent years. It is widely applied in virtual reality, video surveillance, and video retrieval. With the advancement of deep learning algorithms and computer hardware, the conventional two-dimensional convolution technique for training video models has been replaced by three-dimensional convolution, which enables the extraction of spatio-temporal features. In particular, the use of 3D convolution in human behavior recognition has attracted growing interest. However, the increased dimensionality brings challenges such as a dramatic increase in the number of parameters, higher time complexity, and a strong dependence on GPUs for effective spatio-temporal feature extraction; without powerful GPU hardware, training can be considerably slow. To address these issues, this study proposes an Adaptive Time Compression (ATC) module. Functioning as an independent component, ATC can be seamlessly integrated into existing architectures and achieves data compression by eliminating redundant frames within video data. The ATC module effectively reduces GPU computing load and time complexity with negligible loss of accuracy, thereby facilitating real-time human behavior recognition.
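As a rough illustration of the frame-elimination idea, the sketch below drops frames whose mean pixel difference from the last retained frame falls under a threshold. The difference metric and threshold are assumptions for illustration, not the paper's ATC module.

```python
# Minimal sketch of adaptive temporal compression by dropping near-duplicate
# frames, assuming video is a (T, H, W, C) uint8 array. Threshold and metric
# are illustrative assumptions.
import numpy as np

def compress_frames(video: np.ndarray, threshold: float = 8.0) -> np.ndarray:
    """Keep a frame only if it differs enough from the last kept frame."""
    kept = [video[0]]
    for frame in video[1:]:
        # Mean absolute pixel difference against the last retained frame.
        diff = np.abs(frame.astype(np.float32) - kept[-1].astype(np.float32)).mean()
        if diff > threshold:
            kept.append(frame)
    return np.stack(kept)

# A mostly static 64-frame clip compresses to very few frames.
base = np.random.randint(0, 255, size=(112, 112, 3), dtype=np.uint8)
clip = np.repeat(base[None], 64, axis=0)
print(compress_frames(clip).shape)  # (1, 112, 112, 3)
```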


Algorithms; Data Compression; Video Recording; Humans; Data Compression/methods; Human Activities; Deep Learning; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods
2.
PLoS One ; 19(5): e0298373, 2024.
Article En | MEDLINE | ID: mdl-38691542

Pulse repetition interval modulation (PRIM) is integral to radar identification in modern electronic support measure (ESM) and electronic intelligence (ELINT) systems. Various distortions, including missing pulses, spurious pulses, unintended jitters, and noise from radar antenna scans, often hinder the accurate recognition of PRIM. This research introduces a novel three-stage approach for PRIM recognition, emphasizing the innovative use of PRI sound. A transfer learning-aided deep convolutional neural network (DCNN) is initially used for feature extraction. This is followed by an extreme learning machine (ELM) for real-time PRIM classification. Finally, a gray wolf optimizer (GWO) refines the network's robustness. To evaluate the proposed method, we developed a real experimental dataset consisting of the sounds of six common PRI patterns. We utilized eight pre-trained DCNN architectures for evaluation, with VGG16 and ResNet50V2 notably achieving recognition accuracies of 97.53% and 96.92%. Integrating ELM and GWO further optimized the accuracy rates to 98.80% and 97.58%, respectively. This research advances radar identification by offering an enhanced method for PRIM recognition, emphasizing the potential of PRI sound to address real-world distortions in ESM and ELINT systems.


Deep Learning; Neural Networks, Computer; Sound; Radar; Algorithms; Pattern Recognition, Automated/methods
3.
Sensors (Basel) ; 24(9)2024 Apr 24.
Article En | MEDLINE | ID: mdl-38732808

Currently, surface EMG signals have a wide range of applications in human-computer interaction systems. However, selecting features for gesture recognition models based on traditional machine learning can be challenging and may not yield satisfactory results. Considering the strong nonlinear generalization ability of neural networks, this paper proposes a two-stream residual network model with an attention mechanism for gesture recognition. One branch processes surface EMG signals, while the other processes hand acceleration signals. Segmented networks are utilized to fully extract the physiological and kinematic features of the hand. To enhance the model's capacity to learn crucial information, we introduce an attention mechanism after global average pooling. This mechanism strengthens relevant features and weakens irrelevant ones. Finally, the deep features obtained from the two branches of learning are fused to further improve the accuracy of multi-gesture recognition. The experiments conducted on the NinaPro DB2 public dataset resulted in a recognition accuracy of 88.25% for 49 gestures. This demonstrates that our network model can effectively capture gesture features, enhancing accuracy and robustness across various gestures. This approach to multi-source information fusion is expected to provide more accurate and real-time commands for exoskeleton robots and myoelectric prosthetic control systems, thereby enhancing the user experience and the naturalness of robot operation.
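A minimal PyTorch sketch of the two-stream idea described above: one branch for sEMG, one for acceleration, SE-style channel attention after global average pooling, and late fusion of the two feature vectors. Layer sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    def __init__(self, in_channels: int, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, feat_dim, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.gap = nn.AdaptiveAvgPool1d(1)
        # SE-style attention: strengthen relevant channels, weaken the rest.
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 4), nn.ReLU(),
            nn.Linear(feat_dim // 4, feat_dim), nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (batch, channels, time)
        f = self.gap(self.conv(x)).squeeze(-1)  # (batch, feat_dim)
        return f * self.attn(f)                 # re-weighted features

class TwoStreamNet(nn.Module):
    def __init__(self, emg_ch=12, acc_ch=3, num_gestures=49):
        super().__init__()
        self.emg, self.acc = Branch(emg_ch), Branch(acc_ch)
        self.head = nn.Linear(256, num_gestures)  # fused 128+128 features

    def forward(self, emg, acc):
        return self.head(torch.cat([self.emg(emg), self.acc(acc)], dim=1))

logits = TwoStreamNet()(torch.randn(8, 12, 200), torch.randn(8, 3, 200))
```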


Electromyography; Gestures; Neural Networks, Computer; Humans; Electromyography/methods; Signal Processing, Computer-Assisted; Pattern Recognition, Automated/methods; Acceleration; Algorithms; Hand/physiology; Machine Learning; Biomechanical Phenomena/physiology
4.
Sensors (Basel) ; 24(9)2024 Apr 25.
Article En | MEDLINE | ID: mdl-38732843

As the number of electronic gadgets in our daily lives increases, and most of them require some kind of human interaction, innovative and convenient input methods are in demand. State-of-the-art (SotA) ultrasound-based hand gesture recognition (HGR) systems have limitations in terms of robustness and accuracy. This research presents a novel machine learning (ML)-based end-to-end solution for hand gesture recognition with low-cost micro-electromechanical system (MEMS) ultrasonic transducers. In contrast to prior methods, our ML model processes the raw echo samples directly instead of using pre-processed data. Consequently, the processing flow presented in this work leaves it to the ML model to extract the important information from the echo data. The success of this approach is demonstrated as follows. Four MEMS ultrasonic transducers are placed in three different geometrical arrangements. For each arrangement, different types of ML models are optimized and benchmarked on datasets acquired with the presented custom hardware (HW): convolutional neural networks (CNNs), gated recurrent units (GRUs), long short-term memory (LSTM), vision transformer (ViT), and cross-attention multi-scale vision transformer (CrossViT). The last three of these models reached more than 88% accuracy. The key contribution of this research is the demonstration that little pre-processing is necessary to obtain high accuracy in ultrasonic HGR for several arrangements of cost-effective and low-power MEMS ultrasonic transducer arrays; even the computationally intensive Fourier transform can be omitted. The presented approach is further compared to HGR systems using other sensor types such as vision, WiFi, radar, and state-of-the-art ultrasound-based HGR systems. Direct processing of the sensor signals by a compact model makes ultrasonic hand gesture recognition a true low-cost and power-efficient input method.


Gestures; Hand; Machine Learning; Neural Networks, Computer; Humans; Hand/physiology; Pattern Recognition, Automated/methods; Ultrasonography/methods; Ultrasonography/instrumentation; Ultrasonics/instrumentation; Algorithms
5.
Sensors (Basel) ; 24(9)2024 Apr 25.
Article En | MEDLINE | ID: mdl-38732846

Brain-computer interfaces (BCIs) allow information to be transmitted directly from the human brain to a computer, enhancing the ability of human brain activity to interact with the environment. In particular, BCI-based control systems are highly desirable because they can control equipment used by people with disabilities, such as wheelchairs and prosthetic legs. BCIs make use of electroencephalograms (EEGs) to decode the human brain's status. This paper presents an EEG-based facial gesture recognition method based on a self-organizing map (SOM). The proposed method uses the α, β, and θ power bands of the EEG signals as the features of a gesture, and a SOM-Hebb classifier to classify the feature vectors. We utilized the proposed method to develop an online facial gesture recognition system. The facial gestures were defined by combining facial movements that are easy to detect in EEG signals. The recognition accuracy of the system, examined through experiments, ranged from 76.90% to 97.57% depending on the number of gestures recognized. The lowest accuracy (76.90%) occurred when recognizing seven gestures, though this is still quite accurate compared to other EEG-based recognition systems. The online recognition system was implemented in MATLAB and took 5.7 s to complete the recognition flow.
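A hedged sketch of the feature step: per-channel θ/α/β band powers computed from an EEG window via Welch's method. The band edges and 256 Hz sampling rate are common conventions assumed here, not values taken from the paper.

```python
import numpy as np
from scipy.signal import welch

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_power_features(eeg: np.ndarray, fs: float = 256.0) -> np.ndarray:
    """eeg: (channels, samples) -> flat vector of band powers per channel."""
    freqs, psd = welch(eeg, fs=fs, nperseg=min(eeg.shape[1], 256))
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].mean(axis=1))  # mean power in each band
    return np.concatenate(feats)

window = np.random.randn(8, 512)  # 8 channels, 2 s at 256 Hz
print(band_power_features(window).shape)  # (24,) = 3 bands x 8 channels
```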


Brain-Computer Interfaces; Electroencephalography; Gestures; Humans; Electroencephalography/methods; Face/physiology; Algorithms; Pattern Recognition, Automated/methods; Signal Processing, Computer-Assisted; Brain/physiology; Male
6.
Sensors (Basel) ; 24(9)2024 May 05.
Article En | MEDLINE | ID: mdl-38733038

With the continuous advancement of autonomous driving and monitoring technologies, there is increasing attention on non-intrusive target monitoring and recognition. This paper proposes an ArcFace SE-attention model-agnostic meta-learning approach (AS-MAML) that integrates attention mechanisms into residual networks for pedestrian gait recognition using frequency-modulated continuous-wave (FMCW) millimeter-wave radar. We enhance the feature extraction capability of the base network using channel attention mechanisms and integrate the additive angular margin loss function (ArcFace loss) into the inner loop of MAML to constrain inner-loop optimization and improve radar discrimination. This network is then used to classify small-sample micro-Doppler images obtained from millimeter-wave radar as the data source for pose recognition. Experimental tests were conducted on pose estimation and image classification tasks. The results demonstrate significant detection and recognition performance, with an accuracy of 94.5% and a 95% confidence interval. Additionally, on the open-source dataset DIAT-µRadHAR, which is specially processed to increase classification difficulty, the network achieves a classification accuracy of 85.9%.
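For concreteness, here is a compact sketch of the additive angular margin (ArcFace) idea the abstract invokes: logits become cos(θ + m) on the target class after L2-normalizing features and class weights. The margin m and scale s values are common defaults assumed here, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def arcface_logits(features, weight, labels, s=30.0, m=0.50):
    """features: (B, D), weight: (C, D), labels: (B,) -> scaled logits."""
    cos = F.linear(F.normalize(features), F.normalize(weight))      # (B, C)
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, weight.shape[0]).bool()
    cos_m = torch.where(target, torch.cos(theta + m), cos)  # margin on target
    return s * cos_m

feats, W = torch.randn(8, 64), torch.randn(5, 64)
labels = torch.randint(0, 5, (8,))
loss = F.cross_entropy(arcface_logits(feats, W, labels), labels)
```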


Pedestrians; Radar; Humans; Algorithms; Gait/physiology; Pattern Recognition, Automated/methods; Machine Learning
7.
Sensors (Basel) ; 24(8)2024 Apr 10.
Article En | MEDLINE | ID: mdl-38676024

In recent decades, technological advancements have transformed industry, highlighting the efficiency of automation and the importance of safety. The integration of augmented reality (AR) and gesture recognition has emerged as an innovative approach to creating interactive environments for industrial equipment. Gesture recognition enhances AR applications by allowing intuitive interactions. This study presents a web-based architecture for the integration of AR and gesture recognition, designed for interaction with industrial equipment. Emphasizing hardware-agnostic compatibility, the proposed structure offers intuitive interaction with equipment control systems through natural gestures. Experimental validation, conducted using Google Glass, demonstrated the practical viability and potential of this approach in industrial operations. The development focused on optimizing the system's software and implementing techniques such as normalization, clamping, conversion, and filtering to achieve accurate and reliable gesture recognition under different usage conditions. The proposed approach promotes safer and more efficient industrial operations, contributing to research in AR and gesture recognition. Future work will include improving gesture recognition accuracy, exploring alternative gestures, and expanding platform integration to improve the user experience.
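A hedged sketch of the kind of signal conditioning the study names (normalization, clamping, filtering), applied here to a one-dimensional track of hand-landmark coordinates. The window size and clamp range are illustrative assumptions.

```python
import numpy as np

def condition(signal: np.ndarray, lo=-1.0, hi=1.0, win=5) -> np.ndarray:
    """signal: (T,) raw coordinate track -> conditioned track."""
    x = (signal - signal.mean()) / (signal.std() + 1e-9)  # normalization
    x = np.clip(x, lo, hi)                                # clamping
    kernel = np.ones(win) / win                           # moving-average filter
    return np.convolve(x, kernel, mode="same")

track = np.cumsum(np.random.randn(100))  # noisy 1-D gesture trajectory
smooth = condition(track)
```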


Augmented Reality; Gestures; Humans; Industry; Software; Pattern Recognition, Automated/methods; User-Computer Interface
8.
Sensors (Basel) ; 24(8)2024 Apr 12.
Article En | MEDLINE | ID: mdl-38676108

Egocentric activity recognition is a prominent computer vision task based on the use of wearable cameras. Since egocentric videos are captured from the perspective of the person wearing the camera, her/his body motions severely complicate the video content, imposing several challenges. In this work, we propose a novel approach for domain-generalized egocentric human activity recognition. Typical approaches use a large amount of training data, aiming to cover all possible variants of each action. Moreover, several recent approaches have attempted to handle discrepancies between domains with a variety of costly and mostly unsupervised domain adaptation methods. We show that through simple manipulation of available source-domain data, and with minor involvement from the target domain, we are able to produce robust models that adequately predict human activity in egocentric video sequences. To this end, we introduce a novel three-stream deep neural network architecture that combines elements of vision transformers and residual neural networks and is trained using multi-modal data. We evaluate the proposed approach on a challenging egocentric video dataset and demonstrate its superiority over recent state-of-the-art research works.


Neural Networks, Computer; Video Recording; Humans; Video Recording/methods; Algorithms; Pattern Recognition, Automated/methods; Image Processing, Computer-Assisted/methods; Human Activities; Wearable Electronic Devices
9.
Sensors (Basel) ; 24(8)2024 Apr 14.
Article En | MEDLINE | ID: mdl-38676137

Human action recognition (HAR) is a growing area of machine learning with a wide range of applications. One challenging aspect of HAR is recognizing human actions while playing music, further complicated by the need to recognize the musical notes being played. This paper proposes a deep learning-based method for simultaneous HAR and musical note recognition in music performances. We conducted experiments on performances of the Morin khuur, a traditional Mongolian instrument. The proposed method consists of two stages. First, we created a new dataset of Morin khuur performances, using motion capture systems and depth sensors to collect data that include hand keypoints, instrument segmentation information, and detailed movement information. We then analyzed RGB images, depth images, and motion data to determine which type of data provides the most valuable features for recognizing actions and notes in music performances. The second stage utilizes a Spatial Temporal Attention Graph Convolutional Network (STA-GCN) to recognize musical notes as continuous gestures. The STA-GCN model is designed to learn the relationships between hand keypoints and instrument segmentation information, which are crucial for accurate recognition. Evaluation on our dataset demonstrates that our model outperforms the traditional ST-GCN model, achieving an accuracy of 81.4%.


Deep Learning; Music; Humans; Neural Networks, Computer; Human Activities; Pattern Recognition, Automated/methods; Gestures; Algorithms; Movement/physiology
10.
Sensors (Basel) ; 24(8)2024 Apr 18.
Article En | MEDLINE | ID: mdl-38676207

Teaching gesture recognition is a technique used to recognize the hand movements of teachers in classroom teaching scenarios. This technology is widely used in education, including for classroom teaching evaluation, enhancing online teaching, and assisting special education. However, current research on gesture recognition in teaching mainly focuses on detecting the static gestures of individual students and analyzing their classroom behavior. To analyze the teacher's gestures and mitigate the difficulty of single-target dynamic gesture recognition in multi-person teaching scenarios, this paper proposes skeleton-based teaching gesture recognition (ST-TGR), which learns through spatio-temporal representation. This method mainly uses the human pose estimation technique RTMPose to extract the coordinates of the keypoints of the teacher's skeleton and then inputs the recognized sequence of the teacher's skeleton into the MoGRU action recognition network for classifying gesture actions. The MoGRU action recognition module mainly learns the spatio-temporal representation of target actions by stacking a multi-scale bidirectional gated recurrent unit (BiGRU) and using improved attention mechanism modules. To validate the generalization of the action recognition network model, we conducted comparative experiments on datasets including NTU RGB+D 60, UT-Kinect Action3D, SBU Kinect Interaction, and Florence 3D. The results indicate that, compared with most existing baseline models, the model proposed in this article exhibits better performance in recognition accuracy and speed.
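A minimal PyTorch sketch of the backbone's core idea: stacked bidirectional GRUs over a skeleton keypoint sequence followed by a classifier. The hidden sizes and simple mean-pooling readout are illustrative assumptions, not the MoGRU design itself.

```python
import torch
import torch.nn as nn

class BiGRUClassifier(nn.Module):
    def __init__(self, in_dim=34, hidden=128, layers=2, num_classes=10):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, num_layers=layers,
                          bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                # x: (batch, time, keypoint_dims)
        out, _ = self.gru(x)             # (batch, time, 2*hidden)
        return self.fc(out.mean(dim=1))  # temporal mean pooling

# 17 2-D keypoints flattened to 34 dims per frame, 64-frame clips.
logits = BiGRUClassifier()(torch.randn(4, 64, 34))
```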


Gestures; Humans; Pattern Recognition, Automated/methods; Algorithms; Teaching
11.
Article En | MEDLINE | ID: mdl-38598402

Canonical correlation analysis (CCA), the multivariate synchronization index (MSI), and their extended methods have been widely used for target recognition in brain-computer interfaces (BCIs) based on steady-state visual evoked potentials (SSVEPs), and covariance calculation is an important step in these algorithms. Some studies have shown that embedding time-local information into the covariance can improve the recognition performance of the above algorithms. However, the improvement can only be observed in the recognition results; the principle by which time-local information helps has not been explained. Therefore, we propose a time-local weighted transformation (TT) recognition framework that directly embeds time-local information into the electroencephalography signal through a weighted transformation. The influence of time-local information on the SSVEP signal can then be observed in the frequency domain: low-frequency noise is suppressed at the cost of sacrificing part of the SSVEP fundamental-frequency energy, and the harmonic energy of the SSVEP is enhanced at the cost of introducing a small amount of high-frequency noise. The experimental results show that the TT recognition framework can significantly improve the recognition ability of the algorithms and the separability of extracted features. Its enhancement effect is significantly better than that of the traditional time-local covariance extraction method, giving the framework considerable application potential.
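A minimal sketch of the general idea of embedding time-local information by weighting samples before computing the covariance used by CCA/MSI-style recognizers. The Gaussian weight profile is an illustrative assumption; the paper's TT framework defines its own transformation.

```python
import numpy as np

def time_local_weighted_cov(x: np.ndarray, center: int, width: float) -> np.ndarray:
    """x: (channels, samples). Weight samples near `center`, then covariance."""
    t = np.arange(x.shape[1])
    w = np.exp(-0.5 * ((t - center) / width) ** 2)  # time-local weights
    xw = x * w                                      # weighted transformation
    return (xw @ xw.T) / x.shape[1]

eeg = np.random.randn(8, 1000)
cov = time_local_weighted_cov(eeg, center=500, width=150.0)
print(cov.shape)  # (8, 8)
```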


Brain-Computer Interfaces; Humans; Evoked Potentials, Visual; Pattern Recognition, Automated/methods; Recognition, Psychology; Electroencephalography/methods; Algorithms; Photic Stimulation
12.
Asian Pac J Cancer Prev ; 25(4): 1265-1270, 2024 Apr 01.
Article En | MEDLINE | ID: mdl-38679986

PURPOSE: This study aims to compare the accuracy of the ADNEX MR scoring system and the pattern recognition system for evaluating adnexal lesions indeterminate on ultrasound. METHODS: In this cross-sectional retrospective study, pelvic DCE-MRI of 245 patients with 340 adnexal masses was studied based on the ADNEX MR scoring system and the pattern recognition system. RESULTS: The ADNEX MR scoring system, with a sensitivity of 96.6% and a specificity of 91%, has an accuracy of 92.9%. The pattern recognition system's sensitivity, specificity, and accuracy are 95.8%, 93.3%, and 94.7%, respectively. PPV and NPV were 85.1% and 98.1% for the ADNEX MR scoring system, and 89.7% and 97.7% for the pattern recognition system. The area under the ROC curve is 0.938 (95% CI, 0.909-0.967) for the ADNEX MR scoring system and 0.950 (95% CI, 0.922-0.977) for the pattern recognition system. Pairwise comparison of these AUCs showed no significant difference (p = 0.052). CONCLUSION: The pattern recognition system is less sensitive than the ADNEX MR scoring system, yet more specific.
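As a quick plausibility check, PPV and NPV follow from sensitivity, specificity, and prevalence via Bayes' rule. The sketch below assumes roughly 35% malignancy prevalence (an assumption, not a figure from the study) and reproduces the reported ADNEX MR values to within rounding.

```python
def ppv_npv(sens: float, spec: float, prevalence: float):
    tp = sens * prevalence              # true-positive fraction
    fp = (1 - spec) * (1 - prevalence)  # false-positive fraction
    tn = spec * (1 - prevalence)        # true-negative fraction
    fn = (1 - sens) * prevalence        # false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)

# ADNEX MR: sens 96.6%, spec 91%, assumed ~35% prevalence.
ppv, npv = ppv_npv(0.966, 0.91, 0.35)
print(f"PPV={ppv:.1%}, NPV={npv:.1%}")  # ~85%, ~98%
```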


Adnexal Diseases; Magnetic Resonance Imaging; Humans; Female; Cross-Sectional Studies; Retrospective Studies; Middle Aged; Adnexal Diseases/diagnostic imaging; Adnexal Diseases/pathology; Adnexal Diseases/diagnosis; Adult; Magnetic Resonance Imaging/methods; Aged; Prognosis; ROC Curve; Follow-Up Studies; Adolescent; Young Adult; Pattern Recognition, Automated/methods; Adnexa Uteri/pathology; Adnexa Uteri/diagnostic imaging
13.
PLoS One ; 19(4): e0301093, 2024.
Article En | MEDLINE | ID: mdl-38662662

Feature enhancement plays a crucial role in improving the quality and discriminative power of features used in matching tasks. By enhancing the informative and invariant aspects of features, the matching process becomes more robust and reliable, enabling accurate predictions even in challenging scenarios, such as occlusion and reflection in stereo matching. In this paper, we propose an end-to-end dual-dimension feature modulation network called DFMNet to address the issue of mismatches in interference areas. DFMNet utilizes dual-dimension feature modulation (DFM) to capture spatial and channel information separately. This approach enables the adaptive combination of local features with more extensive contextual information, resulting in an enhanced feature representation that is more effective in dealing with challenging scenarios. Additionally, we introduce the concept of cost filter volume (CFV) by utilizing guide weights derived from group-wise correlation. CFV aids in filtering the concatenated volume adaptively, effectively discarding redundant information, and further improving matching accuracy. To enable real-time performance, we designed a fast version named Fast-GFM. Fast-GFM employs the global feature modulation (GFM) block to enhance the feature expression ability, improving the accuracy and stereo matching robustness. The accurate DFMNet and the real-time Fast-GFM achieve state-of-the-art performance across multiple benchmarks, including Scene Flow, KITTI, ETH3D, and Middlebury. These results demonstrate the effectiveness of our proposed methods in enhancing feature representation and significantly improving matching accuracy in various stereo matching scenarios.


Algorithms; Neural Networks, Computer; Humans; Pattern Recognition, Automated/methods
14.
PLoS One ; 19(4): e0298699, 2024.
Article En | MEDLINE | ID: mdl-38574042

Sign language recognition presents significant challenges due to the intricate nature of hand gestures and the necessity of capturing fine-grained details. In response to these challenges, a novel approach is proposed: the Lightweight Attentive VGG16 with Random Forest (LAVRF) model. LAVRF introduces a refined adaptation of the VGG16 model integrated with attention modules, complemented by a Random Forest classifier. By streamlining the VGG16 architecture, the Lightweight Attentive VGG16 effectively manages complexity while incorporating attention mechanisms that dynamically concentrate on pertinent regions within input images, resulting in enhanced representation learning. Leveraging the Random Forest classifier provides notable benefits, including proficient handling of high-dimensional feature representations, reduction of variance and overfitting concerns, and resilience against noisy and incomplete data. Additionally, model performance is further optimized through hyperparameter optimization using Optuna in conjunction with hill climbing, which efficiently explores the hyperparameter space to discover optimal configurations. The proposed LAVRF model demonstrates outstanding accuracy on three datasets, achieving remarkable results of 99.98%, 99.90%, and 100% on the American Sign Language, American Sign Language with Digits, and NUS Hand Posture datasets, respectively.
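A hedged sketch of the hybrid pipeline's shape: deep features from a truncated VGG16 feed a Random Forest classifier. The attention modules and the exact truncation point are the paper's; plain VGG16 convolutional features are assumed here purely for illustration.

```python
import torch
import torchvision.models as models
from sklearn.ensemble import RandomForestClassifier

vgg = models.vgg16(weights=None).features.eval()  # conv backbone only

@torch.no_grad()
def deep_features(images: torch.Tensor) -> torch.Tensor:
    """images: (N, 3, 224, 224) -> pooled feature vectors (N, 512)."""
    fmap = vgg(images)            # (N, 512, 7, 7)
    return fmap.mean(dim=(2, 3))  # global average pooling

X = deep_features(torch.randn(32, 3, 224, 224)).numpy()
y = [i % 4 for i in range(32)]   # dummy gesture labels
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```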


Random Forest; Sign Language; Humans; Pattern Recognition, Automated/methods; Gestures; Upper Extremity
15.
Article En | MEDLINE | ID: mdl-38683719

To overcome the challenges posed by the complex structure and large parameter requirements of existing classification models, this study proposes an improved extreme learning machine (ELM) classifier for human locomotion intent recognition, resulting in enhanced classification accuracy. The structure of the ELM algorithm is enhanced using the logistic regression (LR) algorithm, significantly reducing the number of hidden-layer nodes. Hence, the algorithm can be adopted for real-time human locomotion intent recognition on portable devices, with only 234 parameters to store. Additionally, a hybrid grey wolf optimization and slime mould algorithm (GWO-SMA) is proposed to optimize the hidden-layer bias of the improved ELM classifier. Numerical results demonstrate that the proposed model successfully recognizes nine daily motion modes, including low-, mid-, and fast-speed level-ground walking, ramp ascent/descent, sit/stand, and stair ascent/descent. Specifically, it achieves 96.75% accuracy with 5-fold cross-validation while maintaining a real-time prediction time of only 2 ms. These promising findings highlight the potential of onboard real-time recognition of continuous locomotion modes based on our model for the high-level control of powered knee prostheses.
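For readers unfamiliar with ELMs, here is a minimal sketch of the basic machine: a fixed random hidden layer with output weights solved in closed form by least squares. The paper's LR-based structure reduction and GWO-SMA bias optimization are not reproduced; sizes are illustrative assumptions.

```python
import numpy as np

class ELM:
    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_in, n_hidden))  # fixed random weights
        self.b = rng.normal(size=n_hidden)          # hidden bias (GWO-SMA's target)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # sigmoid

    def fit(self, X, Y):
        # Output weights via Moore-Penrose pseudoinverse (least squares).
        self.beta = np.linalg.pinv(self._hidden(X)) @ Y
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta).argmax(axis=1)

X = np.random.randn(200, 20)
Y = np.eye(9)[np.random.randint(0, 9, 200)]  # nine locomotion modes, one-hot
model = ELM(20, 25).fit(X, Y)
print(model.predict(X[:5]))
```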


Algorithms; Amputees; Intention; Knee Prosthesis; Machine Learning; Humans; Amputees/rehabilitation; Male; Logistic Models; Locomotion/physiology; Walking; Femur; Pattern Recognition, Automated/methods; Adult
16.
Sensors (Basel) ; 24(6)2024 Mar 20.
Article En | MEDLINE | ID: mdl-38544240

Radio frequency (RF) technology has been applied to enable advanced behavioral sensing in human-computer interaction, owing to its device-free sensing capability and wide availability on Internet of Things devices. However, enabling finger gesture-based identification with high accuracy can be challenging due to low RF signal resolution and user heterogeneity. In this paper, we propose MeshID, a novel RF-based user identification scheme that enables identification through finger gestures with high accuracy. MeshID significantly improves sensing sensitivity to RF signal interference, and hence is able to extract subtle individual biometrics through velocity distribution profiling (VDP) features from less-distinct finger motions such as drawing digits in the air. We design an efficient few-shot model retraining framework based on a first-component reverse module, achieving high model robustness and performance in complex environments. We conduct comprehensive real-world experiments, and the results show that MeshID achieves a user identification accuracy of 95.17% on average across three indoor environments. The results indicate that MeshID outperforms the state-of-the-art in identification performance at lower cost.


Algorithms; Gestures; Humans; Pattern Recognition, Automated/methods; Fingers; Motion
17.
J Ultrasound Med ; 43(6): 1025-1036, 2024 Jun.
Article En | MEDLINE | ID: mdl-38400537

OBJECTIVES: To complete the task of automatic recognition and classification of thyroid nodules and solve the problem of high classification error rates when the samples are imbalanced. METHODS: An improved k-nearest neighbor (KNN) algorithm is proposed and a method for automatic thyroid nodule classification based on the improved KNN algorithm is established. In the improved KNN algorithm, we consider not only the number of class labels for the various classes of data among the k-nearest neighbors, but also their corresponding weights, and we use the Minkowski distance measure instead of the Euclidean distance measure. RESULTS: A total of 508 ultrasound images of thyroid nodules, including 415 benign nodules and 93 malignant nodules, were used in the paper. Experimental results show the improved KNN achieves an accuracy of 0.872549, a precision of 0.867347, a recall of 1, and an F1-score of 0.928962. We also examined the influence of different distance weights, the value of k, and different distance measures on the classification results. CONCLUSIONS: A comparison shows that our method performs better than the traditional KNN and other classical machine learning methods.
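A hedged sketch of a distance-weighted KNN with a Minkowski metric, in the spirit of the improved KNN described above. The inverse-distance weighting scheme and p = 3 are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=5, p=3):
    dists = np.sum(np.abs(X_train - x) ** p, axis=1) ** (1.0 / p)  # Minkowski
    idx = np.argsort(dists)[:k]
    votes = {}
    for i in idx:
        w = 1.0 / (dists[i] + 1e-9)  # nearer neighbors vote more strongly
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)

X_train = np.random.randn(100, 6)       # e.g., nodule texture features
y_train = np.random.randint(0, 2, 100)  # 0 = benign, 1 = malignant
print(weighted_knn_predict(X_train, y_train, np.random.randn(6)))
```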


Algorithms; Thyroid Nodule; Ultrasonography; Thyroid Nodule/diagnostic imaging; Thyroid Nodule/classification; Humans; Ultrasonography/methods; Reproducibility of Results; Thyroid Gland/diagnostic imaging; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods
18.
Med Biol Eng Comput ; 62(6): 1911-1924, 2024 Jun.
Article En | MEDLINE | ID: mdl-38413518

Micro-expressions (MEs) play an important role in revealing a person's genuine emotions, which has made micro-expression recognition an important research focus in recent years. Most recent researchers have made efforts to recognize MEs using the spatial and temporal information of video clips. However, because of their short duration and subtle intensity, capturing the spatio-temporal features of micro-expressions remains challenging. To improve recognition performance, this paper presents a novel paralleled dual-branch attention-based spatio-temporal fusion network (PASTFNet). We jointly extract short- and long-range spatial relationships in the spatial branch. Inspired by the composite convolutional neural network (CNN) and long short-term memory (LSTM) architecture for temporal modeling, we propose a novel attention-based multi-scale feature fusion network (AMFNet) to encode the features of sequential frames; it can learn more expressive facial-detail features because it integrates attention with multi-scale feature fusion. We then design an aggregation block to aggregate and acquire temporal features. Finally, the features learned by the two branches are fused to accomplish expression recognition to outstanding effect. Experiments on two MER datasets (CASMEII and SAMM) show that the PASTFNet model achieves promising ME recognition performance compared with other methods.


Neural Networks, Computer; Humans; Attention/physiology; Facial Expression; Emotions/physiology; Algorithms; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods
19.
Sensors (Basel) ; 24(3)2024 Jan 26.
Article En | MEDLINE | ID: mdl-38339542

Japanese Sign Language (JSL) is vital for communication in Japan's deaf and hard-of-hearing community. However, probably because of the large number of patterns (46 types), which mix static and dynamic gestures, the dynamic ones have been excluded in most studies. Few researchers have worked on recognizing the dynamic JSL alphabet, and the reported accuracy is unsatisfactory. We propose a dynamic JSL recognition system that uses effective feature extraction and feature selection approaches to overcome these challenges. The procedure combines hand pose estimation, effective feature extraction, and machine learning techniques. We collected a video dataset capturing JSL gestures with standard RGB cameras and employed MediaPipe for hand pose estimation. Four types of features were proposed; their significance is that the same feature generation method can be used regardless of the number of frames or whether the gestures are dynamic or static. We employed a random forest (RF)-based feature selection approach to select the potential features. Finally, we fed the reduced features into a kernel-based support vector machine (SVM) classifier. Evaluations conducted on our proprietary, newly created dynamic Japanese Sign Language alphabet dataset and the LSA64 dynamic dataset yielded recognition accuracies of 97.20% and 98.40%, respectively. This approach not only addresses the complexities of JSL but also holds the potential to bridge communication gaps, offering effective communication for the deaf and hard-of-hearing, with broader implications for sign language recognition systems globally.
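A hedged sketch of the downstream pipeline's shape: distance-based features from 21 MediaPipe-style hand landmarks, Random Forest feature selection, then a kernel SVM. The paper's four feature types are not specified here; pairwise landmark distances are an illustrative stand-in.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def landmark_features(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (21, 3) -> pairwise distances between all landmarks."""
    diffs = landmarks[:, None, :] - landmarks[None, :, :]
    d = np.linalg.norm(diffs, axis=-1)
    return d[np.triu_indices(21, k=1)]  # 210 unique distances

X = np.stack([landmark_features(np.random.rand(21, 3)) for _ in range(300)])
y = np.random.randint(0, 41, 300)       # dummy JSL sign labels

clf = make_pipeline(
    SelectFromModel(RandomForestClassifier(n_estimators=100)),  # RF selection
    SVC(kernel="rbf"),                                          # kernel SVM
)
clf.fit(X, y)
```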


Pattern Recognition, Automated; Sign Language; Humans; Japan; Pattern Recognition, Automated/methods; Hand; Algorithms; Gestures
20.
Sensors (Basel) ; 24(3)2024 Jan 31.
Article En | MEDLINE | ID: mdl-38339637

Surface electromyogram (sEMG)-based gesture recognition has emerged as a promising avenue for developing intelligent prostheses for upper limb amputees. However, the temporal variations in sEMG have rendered recognition models less efficient than anticipated. By using cross-session calibration and increasing the amount of training data, it is possible to reduce these variations. The impact of varying the amount of calibration and training data on gesture recognition performance for amputees is still unknown. To assess these effects, we present four datasets for the evaluation of calibration data and examine the impact of the amount of training data on benchmark performance. Two amputees who had undergone amputations years prior were recruited, and seven sessions of data were collected for analysis from each of them. Ninapro DB6, a publicly available database containing data from ten healthy subjects across ten sessions, was also included in this study. The experimental results show that the calibration data improved the average accuracy by 3.03%, 6.16%, and 9.73% for the two subjects and Ninapro DB6, respectively, compared to the baseline results. Moreover, it was discovered that increasing the number of training sessions was more effective in improving accuracy than increasing the number of trials. Three potential strategies are proposed in light of these findings to enhance cross-session models further. We consider these findings to be of the utmost importance for the commercialization of intelligent prostheses, as they demonstrate the criticality of gathering calibration and cross-session training data, while also offering effective strategies to maximize the utilization of the entire dataset.


Amputees; Artificial Limbs; Humans; Electromyography/methods; Calibration; Pattern Recognition, Automated/methods; Upper Extremity; Algorithms
...